Today, you will load a filtered gapminder dataset, a subset of data on global development from 1952 to 2007 in increments of 5 years, which captures the period between the Second World War and the Global Financial Crisis.
Your task: Explore the data and visualise it in both static and animated ways, providing answers and solutions to the 7 questions/tasks below.
First, install and load the relevant packages: ‘tidyverse’, ‘gganimate’, and ‘gapminder’.
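The loading code itself is not shown in this rendered output. A minimal sketch of what it could look like, assuming the three packages come from CRAN (the commented-out install line is an assumption and only needs to be run once):

```r
# Assumption: uncomment and run once if the packages are not installed yet.
# install.packages(c("tidyverse", "gganimate", "gapminder"))

library(tidyverse)  # ggplot2, dplyr, and the %>% pipe used below
library(gganimate)  # transition_states() and transition_time()
library(gapminder)  # the filtered gapminder dataset
```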
## ── Attaching packages ───────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'tibble' was built under R version 3.6.2
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.2
## ── Conflicts ──────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Warning: package 'gganimate' was built under R version 3.6.2
Next, see which specific years are actually represented in the dataset and which variables are recorded for each country. Note that when you run the cell below, R Markdown will give you two results, one for each line, that you can flip between.
unique(gapminder$year)
## [1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
head(gapminder)
## # A tibble: 6 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
gapminder <- as.data.frame(gapminder)
The dataset contains information on each country in the sampled year, its continent, life expectancy, population, and GDP per capita.
Let’s plot all the countries in 1952.
theme_set(theme_bw()) # set theme to white background for better visibility
ggplot(subset(gapminder, year == 1952), aes(gdpPercap, lifeExp, size = pop)) +
geom_point() +
scale_x_log10()
We see an interesting spread with an outlier to the right. Answer the following questions, please:
Q1. Why does it make sense to have a log10 scale on the x axis?
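#Answer Q1 GDP per capita spans several orders of magnitude (from about 300 to more than 100,000 in 1952). On a linear axis the single very rich outlier would stretch the scale, and almost all other countries would be squeezed into a narrow band on the left. The log10 transformation spaces countries by ratio rather than by absolute difference, so both the poorer and the richer countries remain readable in the same plot.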
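As a rough illustration of this effect, the sketch below compares where a country would sit on a linear axis and on a log10 axis. It uses only base R and three 1952 values copied from the sorted output for Q2 in this document; the variable names are made up for the example.

```r
# GDP per capita in 1952 for the richest, second richest, and poorest
# country, copied from the sorted Q2 output in this document.
gdp_1952 <- c(Kuwait = 108382.3529, Switzerland = 14734.2327, Lesotho = 298.8462)

# Linear scale: position as a share of the axis from minimum to maximum.
linear_pos <- (gdp_1952 - min(gdp_1952)) / diff(range(gdp_1952))

# Log10 scale: the same share, computed on log10-transformed values.
log_gdp <- log10(gdp_1952)
log_pos <- (log_gdp - min(log_gdp)) / diff(range(log_gdp))

round(rbind(linear = linear_pos, log10 = log_pos), 2)
```

On the linear axis Switzerland ends up close to the poorest country, while on the log10 axis it sits well past the midpoint, which is why the spread is easier to read.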
Q2. Which country is the richest in 1952 (far right on the x axis)?
gapminder %>% filter(year==1952) %>% arrange(desc(gdpPercap))
## country continent year lifeExp pop gdpPercap
## 1 Kuwait Asia 1952 55.565 160000 108382.3529
## 2 Switzerland Europe 1952 69.620 4815000 14734.2327
## 3 United States Americas 1952 68.440 157553000 13990.4821
## 4 Canada Americas 1952 68.750 14785584 11367.1611
## 5 New Zealand Oceania 1952 69.390 1994794 10556.5757
## 6 Norway Europe 1952 72.670 3327728 10095.4217
## 7 Australia Oceania 1952 69.120 8691212 10039.5956
## 8 United Kingdom Europe 1952 69.180 50430000 9979.5085
## 9 Bahrain Asia 1952 50.939 120447 9867.0848
## 10 Denmark Europe 1952 70.780 4334000 9692.3852
## 11 Netherlands Europe 1952 72.130 10381988 8941.5719
## 12 Sweden Europe 1952 71.860 7124673 8527.8447
## 13 Belgium Europe 1952 68.000 8730405 8343.1051
## 14 Venezuela Americas 1952 55.088 5439568 7689.7998
## 15 Iceland Europe 1952 72.490 147962 7267.6884
## 16 Germany Europe 1952 67.500 69145952 7144.1144
## 17 France Europe 1952 67.410 42459667 7029.8093
## 18 Czech Republic Europe 1952 66.870 9125183 6876.1403
## 19 Saudi Arabia Asia 1952 39.875 4005677 6459.5548
## 20 Finland Europe 1952 66.550 4090500 6424.5191
## 21 Austria Europe 1952 66.800 6927772 6137.0765
## 22 Argentina Americas 1952 62.485 17876956 5911.3151
## 23 Uruguay Americas 1952 66.071 2252965 5716.7667
## 24 Cuba Americas 1952 59.421 6007797 5586.5388
## 25 Hungary Europe 1952 64.030 9504000 5263.6738
## 26 Ireland Europe 1952 66.910 2952156 5210.2803
## 27 Slovak Republic Europe 1952 64.360 3558137 5074.6591
## 28 Italy Europe 1952 65.940 47666000 4931.4042
## 29 Lebanon Asia 1952 55.928 1439529 4834.8041
## 30 South Africa Africa 1952 45.009 14264935 4725.2955
## 31 Gabon Africa 1952 37.003 420702 4293.4765
## 32 Slovenia Europe 1952 65.570 1489518 4215.0417
## 33 Iraq Asia 1952 45.320 5441766 4129.7661
## 34 Israel Asia 1952 65.390 1620914 4086.5221
## 35 Poland Europe 1952 61.310 25730551 4029.3297
## 36 Chile Americas 1952 54.745 6377619 3939.9788
## 37 Spain Europe 1952 64.940 28549870 3834.0347
## 38 Peru Americas 1952 43.902 8025700 3758.5234
## 39 Serbia Europe 1952 57.996 6860147 3581.4594
## 40 Greece Europe 1952 65.860 7733250 3530.6901
## 41 Ecuador Americas 1952 48.357 3548753 3522.1107
## 42 Angola Africa 1952 30.015 4232095 3520.6103
## 43 Mexico Americas 1952 50.789 30144317 3478.1255
## 44 Japan Asia 1952 63.030 86459025 3216.9563
## 45 Romania Europe 1952 61.050 16630000 3144.6132
## 46 Croatia Europe 1952 61.210 3882229 3119.2365
## 47 Nicaragua Americas 1952 42.314 1165790 3112.3639
## 48 Puerto Rico Americas 1952 64.280 2227000 3081.9598
## 49 Portugal Europe 1952 59.820 8526050 3068.3199
## 50 Hong Kong, China Asia 1952 60.960 2125900 3054.4212
## 51 El Salvador Americas 1952 45.262 2042865 3048.3029
## 52 Iran Asia 1952 44.869 17272000 3035.3260
## 53 Trinidad and Tobago Americas 1952 59.100 662850 3023.2719
## 54 Jamaica Americas 1952 58.530 1426095 2898.5309
## 55 Reunion Africa 1952 52.724 257700 2718.8853
## 56 Bolivia Americas 1952 40.414 2883315 2677.3263
## 57 Djibouti Africa 1952 34.812 63149 2669.5295
## 58 Montenegro Europe 1952 59.164 413834 2647.5856
## 59 Costa Rica Americas 1952 57.206 926317 2627.0095
## 60 Panama Americas 1952 55.191 940080 2480.3803
## 61 Algeria Africa 1952 43.077 9279525 2449.0082
## 62 Bulgaria Europe 1952 59.600 7274900 2444.2866
## 63 Guatemala Americas 1952 42.023 3146381 2428.2378
## 64 Namibia Africa 1952 41.725 485831 2423.7804
## 65 Libya Africa 1952 42.723 1019729 2387.5481
## 66 Singapore Asia 1952 60.396 1127000 2315.1382
## 67 Honduras Americas 1952 41.912 1517453 2194.9262
## 68 Colombia Americas 1952 50.643 12350771 2144.1151
## 69 Congo, Rep. Africa 1952 42.111 854885 2125.6214
## 70 Brazil Americas 1952 50.917 56602560 2108.9444
## 71 Turkey Europe 1952 43.585 22235677 1969.1010
## 72 Mauritius Africa 1952 50.986 516556 1967.9557
## 73 Paraguay Americas 1952 62.649 1555876 1952.3087
## 74 Haiti Americas 1952 37.579 3201488 1840.3669
## 75 Malaysia Asia 1952 48.463 6748378 1831.1329
## 76 Oman Asia 1952 37.578 507833 1828.2303
## 77 Morocco Africa 1952 42.873 9939217 1688.2036
## 78 Syria Asia 1952 45.883 3661549 1643.4854
## 79 Sudan Africa 1952 38.635 8504667 1615.9911
## 80 Albania Europe 1952 55.230 1282697 1601.0561
## 81 Jordan Asia 1952 43.158 607914 1546.9078
## 82 West Bank and Gaza Asia 1952 43.160 1030585 1515.5923
## 83 Tunisia Africa 1952 44.600 3647735 1468.4756
## 84 Senegal Africa 1952 37.278 2755589 1450.3570
## 85 Madagascar Africa 1952 36.681 4762912 1443.0117
## 86 Egypt Africa 1952 41.893 22223309 1418.8224
## 87 Dominican Republic Americas 1952 45.928 2491346 1397.7171
## 88 Cote d'Ivoire Africa 1952 40.477 2977019 1388.5947
## 89 Philippines Asia 1952 47.752 22438691 1272.8810
## 90 Taiwan Asia 1952 58.500 8550362 1206.9479
## 91 Chad Africa 1952 38.092 2682462 1178.6659
## 92 Cameroon Africa 1952 38.523 5009067 1172.6677
## 93 Swaziland Africa 1952 41.407 290243 1148.3766
## 94 Zambia Africa 1952 42.038 2672000 1147.3888
## 95 Somalia Africa 1952 32.978 2526994 1135.7498
## 96 Comoros Africa 1952 40.715 153936 1102.9909
## 97 Korea, Dem. Rep. Asia 1952 50.056 8865488 1088.2778
## 98 Sri Lanka Asia 1952 57.593 7982342 1083.5320
## 99 Nigeria Africa 1952 36.324 33119096 1077.2819
## 100 Central African Republic Africa 1952 35.463 1291695 1071.3107
## 101 Benin Africa 1952 38.223 1738315 1062.7522
## 102 Korea, Rep. Asia 1952 47.453 20947571 1030.5922
## 103 Bosnia and Herzegovina Europe 1952 53.820 2791000 973.5332
## 104 Ghana Africa 1952 43.149 5581001 911.2989
## 105 Sierra Leone Africa 1952 30.331 2143249 879.7877
## 106 Sao Tome and Principe Africa 1952 46.471 60011 879.5836
## 107 Togo Africa 1952 38.596 1219113 859.8087
## 108 Kenya Africa 1952 42.270 6464046 853.5409
## 109 Botswana Africa 1952 47.622 442308 851.2411
## 110 Mongolia Asia 1952 42.244 800663 786.5669
## 111 Yemen, Rep. Asia 1952 32.548 4963829 781.7176
## 112 Congo, Dem. Rep. Africa 1952 39.143 14100005 780.5423
## 113 Afghanistan Asia 1952 28.801 8425333 779.4453
## 114 Niger Africa 1952 37.444 3379468 761.8794
## 115 Thailand Asia 1952 50.848 21289402 757.7974
## 116 Indonesia Asia 1952 37.468 82052000 749.6817
## 117 Mauritania Africa 1952 40.543 1022556 743.1159
## 118 Uganda Africa 1952 39.978 5824797 734.7535
## 119 Tanzania Africa 1952 41.215 8322925 716.6501
## 120 Pakistan Asia 1952 43.436 41346560 684.5971
## 121 Bangladesh Asia 1952 37.484 46886859 684.2442
## 122 Vietnam Asia 1952 40.412 26246839 605.0665
## 123 Liberia Africa 1952 38.480 863308 575.5730
## 124 India Asia 1952 37.373 372000000 546.5657
## 125 Nepal Asia 1952 36.157 9182536 545.8657
## 126 Burkina Faso Africa 1952 31.975 4469979 543.2552
## 127 Guinea Africa 1952 33.609 2664249 510.1965
## 128 Rwanda Africa 1952 40.000 2534927 493.3239
## 129 Gambia Africa 1952 30.000 284320 485.2307
## 130 Mozambique Africa 1952 31.286 6446316 468.5260
## 131 Mali Africa 1952 33.685 3838168 452.3370
## 132 Zimbabwe Africa 1952 48.451 3080907 406.8841
## 133 China Asia 1952 44.000 556263527 400.4486
## 134 Equatorial Guinea Africa 1952 34.482 216964 375.6431
## 135 Malawi Africa 1952 36.256 2917802 369.1651
## 136 Cambodia Asia 1952 39.417 4693836 368.4693
## 137 Ethiopia Africa 1952 34.078 20860941 362.1463
## 138 Burundi Africa 1952 39.031 2445618 339.2965
## 139 Myanmar Asia 1952 36.319 20092996 331.0000
## 140 Eritrea Africa 1952 35.928 1438760 328.9406
## 141 Guinea-Bissau Africa 1952 32.500 580653 299.8503
## 142 Lesotho Africa 1952 42.138 748747 298.8462
#Answer Q2 The sorted output above shows that Kuwait had the highest GDP per capita in 1952, at 108382.35.
You can generate a similar plot for 2007 and compare the differences.
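Sorting and printing all 142 rows works, but only the top row is needed. A shorter sketch, assuming dplyr 1.0.0 or later (which provides slice_max()); the name richest_1952 is only illustrative:

```r
library(dplyr)
library(gapminder)

# slice_max() keeps the n rows with the largest value of a column,
# so there is no need to print the whole sorted table.
richest_1952 <- gapminder %>%
  filter(year == 1952) %>%
  slice_max(gdpPercap, n = 1)

richest_1952
```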
ggplot(subset(gapminder, year == 2007), aes(gdpPercap, lifeExp, size = pop)) +
geom_point() +
scale_x_log10()
The black bubbles are a bit hard to read; the comparison would be easier with a bit more visual differentiation.
Q3. Can you differentiate the continents by color and fix the axis labels?
#Answer Q3 You can differentiate the continents by adding a color argument to the aesthetics, i.e. color = continent. You label the x and y axes with the labs() function, where you specify what each axis should be called. labs() also sets the legend titles: the size= argument names the size legend and the colour= argument names the color legend. If, for example, you had also mapped a variable to shape, you could write shape = "title" to name that legend.
ggplot(subset(gapminder, year == 2007), aes(gdpPercap, lifeExp, size = pop, color=continent)) + #color by continent
geom_point() +
scale_x_log10() +
labs(x = "GDP per capita", y = "Life expectancy", colour = "Continent", size = "Population size") # adding axis labels and legend titles
Q4. What are the five richest countries in the world in 2007?
gapminder %>% filter(year==2007) %>% arrange(desc(gdpPercap)) %>% head(n=5)
## country continent year lifeExp pop gdpPercap
## 1 Norway Europe 2007 80.196 4627926 49357.19
## 2 Kuwait Asia 2007 77.588 2505559 47306.99
## 3 Singapore Asia 2007 79.972 4553009 47143.18
## 4 United States Americas 2007 78.242 301139947 42951.65
## 5 Ireland Europe 2007 78.885 4109086 40676.00
#Answer Q4 The five richest countries in 2007 were Norway, Kuwait, Singapore, United States and Ireland.
The comparison would be easier if we had the two graphs together, animated. We have a lovely tool in R to do this: the gganimate package. There are two ways of animating the gapminder ggplot.
The first step is to create the object to be animated.
anim <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop)) +
geom_point() +
scale_x_log10() # convert x to log scale
anim
This plot collates all the points across time. The next step is to split it into years and animate it. This may take some time, depending on the processing power of your computer (and other things you are asking it to do). Beware that the animation might appear in the ‘Viewer’ pane, not in this rmd preview. You need to knit the document to get the viz inside an html file.
anim + transition_states(year,
transition_length = 1,
state_length = 1)
Notice how the animation moves jerkily, ‘jumping’ from one year to the next 12 times in total. This is a bit clunky, which is why it’s good we have another option.
This option smooths the transition between different ‘frames’, because it interpolates and adds transitional years where there are gaps in the time series data.
anim2 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop)) +
geom_point() +
scale_x_log10() + # convert x to log scale
transition_time(year)
anim2
The much smoother movement in Option 2 will be much more noticeable if you add a title to the chart that pages through the years corresponding to each frame.
Q5 Can you add a title to one or both of the animations above that will change in sync with the animation? [hint: search labeling for transition_states() and transition_time() functions respectively]
anim3 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color=continent)) +
geom_point() +
scale_x_log10() + # convert x to log scale
transition_states(year, transition_length = 1, state_length = 1)+labs(title = 'Year: {closest_state}')
anim3
#Answer Q5 In the code above, transition_states() treats each value of the year variable as one ‘state’ and moves between them. Inside the title, {closest_state} is a placeholder that gganimate fills in for every frame with the state (here, the year) closest to that point in the animation, so the title changes in sync with the points.
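The hint also mentions transition_time(). A sketch of the same idea for that function, which exposes the label variable frame_time instead of closest_state. The object name anim_time is illustrative, and rounding is used here on the assumption that the interpolated time may not be a whole year:

```r
library(ggplot2)
library(gganimate)
library(gapminder)

anim_time <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  scale_x_log10() +        # convert x to log scale
  transition_time(year) +  # interpolate smoothly between the sampled years
  # frame_time holds the (possibly interpolated) time of the current frame
  labs(title = "Year: {round(frame_time)}")

# Printing anim_time (or calling animate(anim_time)) renders the animation.
```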
Q6 Can you make the axes’ labels and units more readable? Consider expanding the abbreviated labels as well as the scientific notation in the legend and x axis to whole numbers. [hint: search disabling scientific notation]
options(scipen = 999) # large penalty against scientific notation, so numbers are printed in fixed notation
anim4 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color=continent)) +
geom_point() +
scale_x_log10() + # convert x to log scale
transition_states(year, transition_length = 1, state_length = 1) +
labs(title = 'Year: {closest_state}', x = "GDP per capita ($) on a log scale", y = "Life expectancy (years)", colour = "Continent", size = "Population size")
anim4
#Answer Q6 Yes, with the options() function. The scipen option is a penalty that R applies when it decides between fixed and scientific notation (the form with the e’s). A large positive value makes R prefer fixed notation, so scientific notation is effectively never used. The abbreviated labels are expanded in labs().
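The effect of scipen can be checked outside the plot with base R alone. A small sketch (the variable names are illustrative):

```r
old <- options(scipen = 0)  # R's default penalty; keep the previous setting in 'old'
default_fmt <- format(1e8)  # "1e+08": scientific notation is shorter, so R uses it

options(scipen = 999)       # large penalty against scientific notation
fixed_fmt <- format(1e8)    # "100000000"

options(old)                # restore whatever setting was active before
c(default = default_fmt, fixed = fixed_fmt)
```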
Q7 Come up with a question you want to answer using the gapminder data and write it down. Then, create a data visualisation that answers the question and explain how your visualization answers the question. (Example: you wish to see what the mean life expectancy was across the continents in the year you were born versus your parents’ birth years). [hint: if you wish to have more data than is in the filtered gapminder, you can either load the gapminder_unfiltered dataset or download more at https://www.gapminder.org/data/ ]
#Answer Q7 Question: How did GDP per capita develop in the two Nordic countries Norway and Denmark over the period 1952-2007?
Here I use the cowplot package to plot the two plots next to each other and the viridis package to get some pretty colors on it.
#install.packages("cowplot") # package for arranging two plots in one figure
#install.packages("viridis") # package for coloring
library(viridis)
## Loading required package: viridisLite
library(cowplot)
## Warning: package 'cowplot' was built under R version 3.6.2
summary(gapminder) # to get summary stats for each column in the gapminder data
## country continent year lifeExp
## Afghanistan: 12 Africa :624 Min. :1952 Min. :23.60
## Albania : 12 Americas:300 1st Qu.:1966 1st Qu.:48.20
## Algeria : 12 Asia :396 Median :1980 Median :60.71
## Angola : 12 Europe :360 Mean :1980 Mean :59.47
## Argentina : 12 Oceania : 24 3rd Qu.:1993 3rd Qu.:70.85
## Australia : 12 Max. :2007 Max. :82.60
## (Other) :1632
## pop gdpPercap
## Min. : 60011 Min. : 241.2
## 1st Qu.: 2793664 1st Qu.: 1202.1
## Median : 7023596 Median : 3531.8
## Mean : 29601212 Mean : 7215.3
## 3rd Qu.: 19585222 3rd Qu.: 9325.5
## Max. :1318683096 Max. :113523.1
##
# theme_minimal() is applied before theme(); in the other order it would reset the text sizes set in theme()
plot_1 <- gapminder %>% filter(country == "Denmark") %>%
  ggplot(aes(x = year, y = gdpPercap)) +
  geom_point(aes(color = gdpPercap), show.legend = FALSE) +
  labs(x = "Year", y = "GDP per capita", title = "Economic development in Denmark", subtitle = "1952-2007") +
  scale_color_viridis(option = "D") +
  theme_minimal() +
  theme(plot.title = element_text(size = 10), plot.subtitle = element_text(size = 8), axis.title = element_text(size = 8))
plot_2 <- gapminder %>% filter(country == "Norway") %>%
  ggplot(aes(x = year, y = gdpPercap)) +
  geom_point(aes(color = gdpPercap), show.legend = FALSE) +
  labs(x = "Year", y = "GDP per capita", title = "Economic development in Norway", subtitle = "1952-2007") +
  scale_color_viridis(option = "D") +
  theme_minimal() +
  theme(plot.title = element_text(size = 10), plot.subtitle = element_text(size = 8), axis.title = element_text(size = 8))
plot_grid(plot_1, plot_2, labels="AUTO")
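The two panels each get their own y axis, which can make a direct comparison harder. One possible alternative, sketched here as an assumption rather than as part of the original answer, is to draw both countries in a single plot with a shared axis (the names nordic and plot_nordic are illustrative):

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Keep only the two countries from the question.
nordic <- gapminder %>%
  filter(country %in% c("Denmark", "Norway"))

# One line per country on a shared y axis.
plot_nordic <- ggplot(nordic, aes(x = year, y = gdpPercap, color = country)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "GDP per capita", color = "Country",
       title = "Economic development in Denmark and Norway",
       subtitle = "1952-2007") +
  theme_minimal()

plot_nordic
```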